Interpreting Unit Segmentation of Conversational Speech in Simultaneous Interpretation Corpus
Abstract
Speech-to-speech translation is becoming an important research topic as speech and language processing technology progresses. For efficient and smooth cross-lingual conversation, the simultaneity of translation processing has a great influence on system performance. This paper describes the segmentation of conversational bilingual speech into interpreting units in the simultaneous interpretation corpus developed at Nagoya University. By manually finding the segmentation points of spoken utterances in the speech corpus, we identified the clause unit as a practical interpreting unit. We examined the usability of this unit and segmented spoken dialogue sentences into interpreting units. A large-scale bilingual corpus in which interpreting units are provided can be used for simultaneous machine interpretation.
Similar resources
Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation
With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there are a few studies on simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting r...
Construction and utilization of bilingual speech corpus for simultaneous machine interpretation research
This paper describes the design, analysis and utilization of a simultaneous interpretation corpus. The corpus has been constructed at the Center for Integrated Acoustic Information Research (CIAIR) of Nagoya University in order to promote the realization of the multi-lingual communication supporting environment. The size of transcribed data is about 1 million words, and the corpus would deserve...
Collection of Simultaneous Interpreting Patterns by Using Bilingual Spoken Monologue Corpus
This paper provides an investigation of simultaneous interpreting patterns using a bilingual spoken monologue corpus. 4,578 pairs of English-Japanese aligned utterances in the CIAIR simultaneous interpretation database were used. This is the largest-scale observation of simultaneous interpreting speech. The simultaneous interpreters are required to generate the target speech si...
Bilingual Spoken Monologue Corpus for Simultaneous Machine Interpretation Research
This paper describes a large-scale bilingual corpus of spoken monologues and their simultaneous interpretation, which has been constructed at CIAIR. The corpus has the following characteristics: (1) English and Japanese speeches are recorded in parallel, (2) the data contains monologue speeches such as lectures and self-introductions, and (3) the exact beginning and ending times are prov...
Recognition and Understanding of Meetings
This paper is about interpreting human communication in meetings using audio, video and other signals. Automatic meeting recognition and understanding is extremely challenging, since communication in a meeting is spontaneous and conversational, and involves multiple speakers and multiple modalities. This leads to a number of significant research problems in signal processing, in speech recognit...